Small scale behavior of financial data 
A new approach is presented to describe the change in the statistics of the log return distribution 
of financial data as a function of the timescale. To this purpose a measure is introduced, which 
quantifies the distance of a considered distribution to a reference distribution. The existence of a 
small timescale regime is demonstrated, which exhibits different properties compared to the normal 
timescale regime for timescales larger than one minute. This regime seems to be universal for 
individual stocks. It is shown that the existence of this small timescale regime is not dependent 
on the special choice of the distance measure or the reference distribution. These findings have 
important implications for risk analysis, in particular for the probability of extreme events. 

PACS numbers: 89.65.Gh 



I. INTRODUCTION 

The origin of mathematical finance dates back to 
Bachelier's famous thesis Théorie de la Spéculation (see 
0). As a central point in this work a normal distribu- 
tion was assumed for financial returns. This assumption 
was for several reasons later changed by other authors to 
a normal distribution for the log return r ,2|. The log 
return r is defined in the following way: 



    r(\tau) := \log(P(t + \tau)) - \log(P(t))    (1)



where P(t) denotes the price of the investment at time 
t. For alternative distributions to the log return distri- 
bution we refer to @ H H S 0, H- Other attempts 
focused on the mechanism that may produce such 
distributions. There remains a general problem to de- 
termine the correct family of distributions, based on an 
appropriate underlying stochastic process, incorporating 
the dependence of the shape of the distribution on the 
timescale. 

In the following we focus on the distribution (or the 
so called probability density function - pdf), which is in 
general dependent on the value of the log return itself as 
well as on the considered timescale. The question of the 
dependence of the shape of the distribution on the time 
scale was already posed in [2]. Considering changes of 
the form of distributions requires to distinguish between 
changes due to the mean value, due to the standard devi- 
ation, see e.g. 01 > an d due to the shape, see e.g. [rHll2|| . 
A discussion of the importance of risk measures like VaR 
and their connection to the underlying distribution can 
be found in [HQ. 

When considering individual stocks, for very large time 
scales the normalized distribution is quite similar to 
a Gaussian distribution. For small timescales a Non- 
Gaussian fat-tailed distribution is obtained. An inter- 
esting question now arises. Is this transition from a fat- 
tailed distribution towards a Gaussian a smooth and uni- 
form process? A general non-parametric method, utiliz- 
ing a Fokker-Planck equation in timescale, has been pro- 
posed, which provides a general description of how the 
shape of the distribution evolves with changing timescale 
[15]. Although this approach is very general, it is based 
on assumptions that are partially no longer fulfilled for 
very small time scales (typically smaller than several min- 
utes). Therefore here a specific non-parametric approach 
is presented, which provides insight into timescales cov- 
ering seconds and minutes. 



II. DATA 



In this study tick-by-tick data sets are used, in order 
to cover timescales as small as possible. The financial 
data sets were provided by the Karlsruher Kapitalmarkt 
Datenbank (KKMDB) [16]. The data sets contain all 
transactions on IBIS and XETRA in the corresponding 
period. The data sets used in this study span from the be- 
ginning of 1993 till the end of 2003 and contain 3-4 × 10^6 
data points. Only stocks with a continuous history of 
trading in this period are considered. Results are pre- 
sented for the three stocks with the largest number of 
trades in this period. These three stocks are Bayer, Volk- 
swagen(VW) and Allianz. In order to investigate changes 
of the shape of the distribution, we analyze in general 
normalized distributions and therefore look at the nor- 
malized return variable R 



    R := \frac{r - \bar{r}}{\sqrt{\overline{r^2} - \bar{r}^2}}    (2)



where the average is taken over the whole data set. In 
order to compare the findings for stocks to other systems, 
the same analysis is performed for a turbulence data set. 
The data set was obtained by measuring the local lon- 
gitudinal and transversal velocity component of a fluid 
in the turbulent wake behind a cylinder with a Taylor- 
based Reynolds number of 180 and contains 31 × 10^6 data 
points. For more details see [17]. 
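
The log return of Eq. (1) and the normalization of Eq. (2) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used for the results reported here: the function names `log_returns` and `normalize` are hypothetical, the price at an arbitrary time is taken to be the last traded price at or before that time (the text does not specify how prices between ticks are obtained), and the normalization is assumed to be to unit standard deviation with an optional mean subtraction.

```python
import numpy as np

def log_returns(times, prices, tau, step=None):
    """Log returns r(tau) = log P(t + tau) - log P(t) from tick data.

    Assumption: P(t) is the last traded price at or before time t.
    `times` must be sorted in increasing order.
    """
    times = np.asarray(times, dtype=float)
    log_p = np.log(np.asarray(prices, dtype=float))
    step = tau if step is None else step
    # Sampling grid; t + tau has to stay inside the observed period.
    t = np.arange(times[0], times[-1] - tau + 1e-12, step)
    # Index of the last tick at or before each requested time.
    i0 = np.searchsorted(times, t, side="right") - 1
    i1 = np.searchsorted(times, t + tau, side="right") - 1
    return log_p[i1] - log_p[i0]

def normalize(r, center=True):
    """Rescale returns to unit standard deviation (optionally zero mean)."""
    r = np.asarray(r, dtype=float)
    sigma = np.sqrt(np.mean(r**2) - np.mean(r)**2)
    return (r - np.mean(r)) / sigma if center else r / sigma
```

With `step` smaller than `tau` the returns overlap, which increases the number of samples at the cost of introducing dependence between neighboring values.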



III. METHOD 

A non-parametric approach to the detection of a 
change in shape of a distribution is a direct measurement 
of the distance between two distributions. p_ref(R) 
denotes the distribution for a reference timescale and 
p(τ, R) the distribution for another timescale. Firstly this 
allows verification of the frequently proposed assumption 
of a constant shape with respect to the timescale. Secondly, 
if the shape is not constant, this provides a quantitative 
measure of the size of the change in the shape of 
the distribution. Therefore a measure is needed to quantify 
the distance between two distributions. Here, the 
Kullback-Leibler entropy is used, which is defined as [18] 

    d_K(p(\tau,R), p_{ref}(R)) := \int_{-\infty}^{+\infty} dR \, p(\tau,R) \cdot \ln\left( \frac{p(\tau,R)}{p_{ref}(R)} \right).    (3)

In order to demonstrate the independence of our results 
on the particular choice of the measure we also use the 
weighted mean square error in logarithmic space 

    d_M(p(\tau,R), p_{ref}(R)) :=    (4)
      \frac{ \int_{-\infty}^{+\infty} dR \, (p(\tau,R) + p_{ref}(R)) \, (\ln p(\tau,R) - \ln p_{ref}(R))^2 }
           { \int_{-\infty}^{+\infty} dR \, (p(\tau,R) + p_{ref}(R)) \, (\ln^2 p(\tau,R) + \ln^2 p_{ref}(R)) }

Furthermore the chi-square distance is used as a third 
measure 

    d_C(p(\tau,R), p_{ref}(R)) := \frac{ \int_{-\infty}^{+\infty} dR \, (p(\tau,R) - p_{ref}(R))^2 }{ \int_{-\infty}^{+\infty} dR \, p_{ref}(R) }.    (5)

Using these distance measures it is possible to determine 
the distance of a log return distribution calculated for a 
certain timescale from a reference distribution. 
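
The three distance measures of Eqs. (3)-(5) can be evaluated numerically once both distributions are estimated on a common grid. The following Python sketch assumes histogram estimates on bins of equal width `dR` and skips bins where either density vanishes when a logarithm is involved; the function names and these discretization choices are assumptions made for illustration and are not taken from the text.

```python
import numpy as np

def density(samples, edges):
    """Histogram estimate of a pdf on fixed, equally spaced bin edges."""
    p, _ = np.histogram(samples, bins=edges, density=True)
    return p

def kullback_leibler(p, p_ref, dR):
    """Eq. (3): integral of p * ln(p / p_ref), as a Riemann sum."""
    m = (p > 0) & (p_ref > 0)  # assumption: empty bins are skipped
    return np.sum(p[m] * np.log(p[m] / p_ref[m])) * dR

def weighted_log_mse(p, p_ref, dR):
    """Eq. (4): weighted mean square error in logarithmic space."""
    m = (p > 0) & (p_ref > 0)  # assumption: empty bins are skipped
    w = p[m] + p_ref[m]
    lp, lq = np.log(p[m]), np.log(p_ref[m])
    # The bin width dR cancels between numerator and denominator.
    return np.sum(w * (lp - lq) ** 2) / np.sum(w * (lp**2 + lq**2))

def chi_square(p, p_ref, dR):
    """Eq. (5): integrated squared difference over the integral of p_ref."""
    return (np.sum((p - p_ref) ** 2) * dR) / (np.sum(p_ref) * dR)
```

For two Gaussian densities with standard deviations 1 and 2, `kullback_leibler` should approach the analytic value ln 2 + 1/8 - 1/2 as the grid is refined, which provides a simple consistency check.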



IV. EVIDENCE OF A NEW UNIVERSAL 
SMALL TIMESCALE REGIME 

For very large timescales the distribution is quite close 
to a Gaussian, therefore the Gaussian distribution is 
taken as reference distribution. In Fig. 1 the Kullback- 
Leibler distance to the Gaussian distribution for three 
individual stocks is shown. It is evident that the behavior 
changes considerably for timescales smaller than 
100 s. For such small scales the pdfs of financial data are 
considerably different from the Gaussian distribution. 

FIG. 1: Kullback-Leibler distance to the Gaussian distribution for three stocks. 

In a second step the distribution of the smallest scale of 
the considered asset is chosen as a reference distribution. 
In Fig. 2 the distance d_K to the smallest timescale for 
the three stocks is shown, together with the one sigma 
error (dotted lines). The error estimate was calculated 
by means of sub-samples of the data set to estimate the 
distribution of the distance measure. Again a transition 
behavior is seen, indicating a change in the stochastic 
behavior in the region from 10 s to 100 s. In all three cases the 
first region may be characterized by a linear increase of 
the distance measure d_K. The linear fit for this first 
region is drawn as a solid line in Fig. 2. (Note the use of 
semilog plots.) 
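
The sub-sample error estimate mentioned above is not specified in detail. One possible reading, sketched here in Python, splits the data into contiguous blocks, evaluates the statistic of interest on each block and reports the spread of the results. The function name `subsample_error`, the use of contiguous blocks and the number of blocks are all assumptions.

```python
import numpy as np

def subsample_error(samples, statistic, n_sub=10):
    """Mean and one-sigma spread of `statistic` over contiguous sub-samples.

    Assumption: the data set is cut into `n_sub` blocks of (nearly) equal
    length and the statistic is evaluated independently on each block.
    """
    chunks = np.array_split(np.asarray(samples, dtype=float), n_sub)
    values = np.array([statistic(c) for c in chunks])
    return values.mean(), values.std(ddof=1)
```

Here `statistic` would be a function that maps a set of normalized returns to a distance from the reference distribution. Note that the spread obtained this way describes sub-samples of reduced size, so it tends to overestimate the uncertainty of the full-sample value.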

In order to verify if the region displaying linear behavior 
is dependent on the chosen reference timescale, 
the analysis has been redone for different reference 
timescales. As an illustration, the results for Volkswagen 
are shown in Fig. 3a. For all these reference distributions 
the extent of the linear region (more precisely the upper 
bound) does not change. This and similar results for the 
other assets indicate that the linear region is independent 
of the timescale that was chosen for the reference 
distribution. Next we discuss the influence of different 
distance measures (Eqs. (3)-(5)). As an example the distance 
to the smallest timescale for VW is shown in Fig. 
3b. Similar results were obtained for other stocks. For 
comparison all distance measures were rescaled to the interval 
[0, 1] in Fig. 3b. For all three distance measures 
a division of the timescale in two parts characterized by 
the different functional behavior in these parts is evident. 

FIG. 2: The distance measure d_K for a reference distribution p_ref(R) := p(τ = 1 s, R) for the individual stocks. The dots 
represent the estimated value, the dotted lines the one sigma error bound and the solid line the linear fit for the first region. 

FIG. 3: a) The distance measure d_K for Volkswagen. b) Three different distance measures with p_ref(R) = p(τ = 1 s, R) 
for Volkswagen. c) Comparison of the distance measure d_K with p_ref(R) = p(τ = 1 s, R) for the original and the permuted 
Volkswagen data set. (Horizontal axes: timescale in sec.) 

A possible reason for the existence of different domains 
may be based on a specific relationship between consecutive 
increments on different timescales. One way to analyze 
this is to destroy all possible causal relationships of 
consecutive increments. This can be done by permuting 
all increments on a certain timescale (here the timescale 
of the reference distribution) and thereby creating a new 
time series with the same p_ref(R). This new time series 
exhibits for small timescales a logarithmic increase 
in the distance measure d_K, see Fig. 3c. Further there is 
no longer a division into two distinct timescale intervals 
with different functional behavior of the distance measure 
d_K. It is therefore evident that the small timescale 
regime is due to functional relationships between consecutive 
increments. In order to investigate if the dependencies 
are linear, the autocorrelation function (ACF) of 
the non-uniformly sampled time series is calculated, cf. 
[19]. The estimator for the autocorrelation is defined in 
the following way 

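    \rho(\tau, \Delta t) :=    (6)
      \frac{ \sum_{i=1}^{N} \sum_{j=1}^{N} [r(\tau,t_i) - \bar{r}(\tau)] \, [r(\tau,t_j) - \bar{r}(\tau)] \, b(t_j - t_i) }
           { \sqrt{ \sum_{i=1}^{N} \sum_{j=1}^{N} [r^2(\tau,t_i) - \bar{r}^2(\tau)] \, b(t_j - t_i) } \; \sqrt{ \sum_{i=1}^{N} \sum_{j=1}^{N} [r^2(\tau,t_j) - \bar{r}^2(\tau)] \, b(t_j - t_i) } }

    b(t_j - t_i) := \begin{cases} 1 & \text{for } |(t_j - t_i) - \Delta t| < \delta \Delta t \\ 0 & \text{otherwise} \end{cases}    (7)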



where r(τ, t_i) is the log return on the timescale τ at the 
time t_i and δ a small number. The results for the ACF, 
computed on a timescale of four seconds, are shown in 
Fig. 4a+b. The computation of the ACF for smaller 
timescales becomes increasingly difficult due to the very 
small number of available log returns. In agreement with 
the literature [jj l2i| , there is a negative autocorrelation 
for the smallest lag, while for larger lags the ACF yields 
values very close to zero. The ACF of the magnitude of 
the log returns is considered in Fig. 4b. Here there is a 
strong positive autocorrelation for the smallest lag, which 
slowly decays for larger lags. However, for both ACFs 



and all the considered stocks there is no indication of a 
small timescale regime in the ACF. It therefore appears 
that the functional relationship between consecutive in- 
crements, which causes the small timescale regime, is of 
nonlinear nature. 
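
An autocorrelation estimate for non-uniformly sampled data in the spirit of Eqs. (6) and (7) can be sketched in Python as follows. Pairs of observations contribute whenever their time difference falls into a slot of relative width δ around the lag Δt. The function name `slotted_acf` is hypothetical, and the normalization used here (squared deviations from the sample mean, summed over the contributing pairs) is a common choice that keeps the estimate between -1 and 1; it is an assumption and may differ in detail from Eq. (6).

```python
import numpy as np

def slotted_acf(t, r, lag, delta=0.1):
    """Autocorrelation at `lag` for values r observed at irregular times t.

    A pair (i, j) contributes when |(t_j - t_i) - lag| < delta * lag,
    which corresponds to the indicator function b of Eq. (7).
    Builds an N x N matrix, so it is only suitable for moderate N.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(r, dtype=float)
    x = x - x.mean()
    dt = t[None, :] - t[:, None]          # dt[i, j] = t_j - t_i
    b = np.abs(dt - lag) < delta * lag    # slot indicator
    i, j = np.nonzero(b)
    if i.size == 0:
        return np.nan                     # no pairs available for this lag
    num = np.sum(x[i] * x[j])
    den = np.sqrt(np.sum(x[i] ** 2) * np.sum(x[j] ** 2))
    return num / den
```

Passing the absolute values of the log returns as `r` gives the corresponding estimate for the magnitude of the log returns.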



V. COMPARISON WITH TURBULENCE DATA 

In ^3 an d HU it has been shown, that finance and 
turbulence data display common properties. The analy- 
sis described above is therefore also performed with tur- 
bulence data in order to see if a small timescale regime 
is present in that case as well. In Fig. 5 the distances 
d_K of the distribution of the velocity increments with respect 
to the Gaussian distribution (a) and with respect 
to a small scale reference distribution (b and c) for the 
turbulence data are shown. The qualitative behavior for 
larger timescales is similar to that observed for individual 
stocks, while for smaller timescales the behavior differs. 
It is important to note the difference in scale of the distance 
measure in Figs. 2 and 5. 



VI. APPLICATIONS 

FIG. 4: a) Autocorrelation function of the log returns for three individual stocks. b) Autocorrelation function of the magnitude 
of the log returns for three individual stocks. 

FIG. 5: Turbulence data: a) The dependence of d_K to the Gaussian distribution on the timescale. b+c) The distance measure 
d_K for a reference distribution p_ref(R) := p(τ = 4 × 10^-5 s, R). The dots represent the estimated value, the dotted lines the one 
sigma error bound. 

How does the specific behavior of the small timescale 
regime translate into practical applications? The deviation 
from the Gaussian distribution increases much 
faster in the small timescale regime than in the normal 
timescale regime. A visual inspection shows that the 
considered log return distributions deviate in the direction 
of fat-tailed distributions. Therefore the probability 
mass in the tails of the distribution should increase 
faster on entering the small timescale regime. In order 
to analyze this, the probability mass in the tails of the 
distribution, i.e. the probability mass beyond the 10th 
standard deviation, where left and right tail are considered 
together, is calculated and the results are compared 
to the distance measure d_K. The reference timescale is 
one second. The results are shown in Fig. 6. In all three 
cases it is evident that the change of the distance measure 
corresponds to a change of probability mass in the 
tails of the distribution. In the small timescale regime 
the increase in the probability mass in the tails of the 
distribution is very pronounced. The estimates of probability 
mass for timescales larger than 10^3 s are rather 
noisy, because such events are quite rare in this 
region. 
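
The tail probability mass used in this comparison is straightforward to estimate from normalized returns. The Python sketch below counts the fraction of values beyond k standard deviations, with both tails taken together, and also gives the corresponding value for a Gaussian distribution for orientation. The function names are hypothetical, and the input is assumed to be already normalized to unit standard deviation.

```python
import math
import numpy as np

def tail_mass(R, k=10.0):
    """Fraction of normalized returns with |R| > k (both tails together)."""
    R = np.asarray(R, dtype=float)
    return np.mean(np.abs(R) > k)

def gaussian_tail_mass(k=10.0):
    """Two-sided tail mass P(|Z| > k) of a standard normal variable."""
    return math.erfc(k / math.sqrt(2.0))
```

For k = 10 the Gaussian value is of the order of 10^-23, so any event observed beyond this threshold in a data set of a few million points signals a strongly non-Gaussian tail.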



VII. CONCLUSIONS 

Summarizing, it has been demonstrated that the prop- 
erties of the log return distribution of stocks do not 
change uniformly if one goes to smaller timescales. In- 
stead, for small timescales a distinct regime is entered 
with different properties. In this small timescale regime, 
the shape of the distribution changes much faster than 
one would expect by extrapolating the behavior of the 
normal timescale regime. This small timescale regime 
extends for individual stocks to our knowledge from 
timescales of around 1 s to timescales of around 15 s. The 
small timescale regime can be characterized by a linear 
dependence of the Kullback-Leibler distance d_K on 
the timescale, if as a reference distribution a log return 
distribution on a very small timescale is chosen. 
In the normal timescale regime the dependence is much 
slower and can be assumed to be logarithmic or, for very 
large timescales, independent of the timescale. This result 
seems to be independent of the chosen reference distribution 
as long as it is a log return distribution on a 
sufficiently small timescale. If the Gaussian distribution is 
taken as a reference distribution, d_K rises very fast 
with decreasing timescale in the small timescale regime, 
while it stays nearly constant in the normal timescale 
regime in accordance with [11]. This indicates a very 
fast deviation from a Gaussian-like shape in the small 
timescale regime. These results could be confirmed with 
different distance measures. Further it has been shown 
that this small timescale regime is a specific feature of 
the financial data investigated here. For turbulence data 
no such small timescale regime is observed, although fi- 
nancial and turbulence data sets exhibit similarities in 
the normal timescale regime. For very small timescales 
the shape of the distribution, in contrast to the find- 
ings for the financial data sets, changes slower than one 
would expect by extrapolating the behavior of the nor- 
mal timescale regime. Two prominent candidates for this 
effect are the dissipation on small scales for turbulence 
(T?| and the noise added by the measurement system. 
Furthermore the particular small timescale regime for 
individual stocks cannot be reproduced by trivial randomized 
data. As an application of this new approach 
it has been demonstrated that on entering the small 
timescale regime, a large increase in the probability mass 
in the tails of the distribution occurs, which could lead to 
very different risk characteristics in comparison to that 
of larger timescales. 

FIG. 6: Comparison between the probability mass beyond the 10th standard deviation (solid line) and d_K (dotted line). 